An Efficient Web Search Engine for Noisy Free Information Retrieval

Author

  • Pradeep Sahoo
Abstract

The rapid growth, dynamic nature, and low quality of content on the World Wide Web make it very difficult to retrieve relevant information from the internet during a query search. To address this issue, various web mining techniques are used. The biggest challenge in web mining is removing noisy or unwanted content from a web page, such as banners, video, audio, images, and hyperlinks, that is not related to the user query. To overcome these issues, this paper proposes a novel custom search engine with an efficient algorithm. The proposed Uniform Resource Locator (URL) Pattern Extractor (UPE) algorithm extracts all relevant index pages from the web and ranks them based on the user query. Then, the Noisy Data Cleaner (NDC) algorithm is applied to remove unwanted content from the retrieved web pages. The results show that the proposed UPE+NDC algorithm provides very promising results on different datasets, with high precision and recall compared with existing algorithms.
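The abstract does not give the NDC rules, so the following is only a minimal sketch of the general idea of stripping noisy elements from a retrieved page. It assumes noise can be approximated by a fixed set of HTML tags; the names `NOISE_TAGS`, `NoiseStripper`, and `strip_noise` are hypothetical and are not taken from the paper.

```python
from html.parser import HTMLParser

# Assumption: these tags approximate "noise" (navigation, media, links, scripts).
# The paper's actual NDC criteria are not stated in the abstract.
NOISE_TAGS = {"script", "style", "video", "audio", "iframe",
              "nav", "aside", "footer", "a"}


class NoiseStripper(HTMLParser):
    """Collects text that is not nested inside any tag listed in NOISE_TAGS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside at least one noise element
        self._chunks = []      # kept text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Void elements such as <img> carry no text, so they drop out naturally.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def strip_noise(html):
    """Return the visible text of `html` with noise elements removed."""
    parser = NoiseStripper()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    page = (
        '<html><body><nav><a href="/">Home</a></nav>'
        "<p>Relevant text.</p><img src='banner.png'>"
        "<video><source src='clip.mp4'>clip</video>"
        '<p>More <a href="#">link</a> content.</p>'
        "<script>var x = 1;</script></body></html>"
    )
    print(strip_noise(page))
```

A tag-based filter like this is deliberately crude: it relies on well-formed markup (an unclosed noise tag would suppress everything after it) and cannot tell a relevant hyperlink from an irrelevant one, which is presumably where a query-aware cleaner such as the proposed NDC would differ.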


Related articles

Review of ranked-based and unranked-based metrics for determining the effectiveness of search engines

Purpose: Traditionally, there have been many metrics for evaluating search engines; nevertheless, various researchers have proposed new metrics in recent years. Awareness of these new metrics is essential for conducting research on search engine evaluation. So, the purpose of this study was to provide an analysis of important and new metrics for evaluating search engines. Methodology: This is ...


Similarity Measure Using Link Based Approach

Web search engines provide an efficient interface to vast information. This web search engine measures the semantic relatedness between the given words and generates the semantic measures automatically, since data on the web is noisy, huge, and dynamic. We propose, analyze, and visualize similarity relationships in Web data sets to identify how to integrate content and link analy...


WWW Search Systems Using SQL*TextRetrieval and Parallel Server for Structured and Unstructured Data

We describe our experience in developing Web Search Systems using Oracle's SQL*TextRetrieval. In the prototype system, we store on-line books in HTML and the HTML documents of a web site; SQL*TextRetrieval is used to index full text and other structured data in the 'web space' and to provide an efficient search engine for free-text search. The Web enables global access to and maximum informa...


Efficient Clustering Multiple Web Search Engine Results and Ranking

The World Wide Web is considered the most valuable place for Information Retrieval and Knowledge Discovery. Building web search engines with effective and efficient techniques for Web service retrieval and selection has become an important issue. Existing web search results are based on keyword matching in a single search engine only. This paper details a modular, self-contained web search results clustering system t...


SIREn: Entity Retrieval System for the Web of Data

We present ongoing work on the Semantic Information Retrieval Engine (SIREn), an “entity retrieval system” specifically designed to meet the requirements of indexing and searching a large amount of semi-structured data, e.g. the entire Web of Data. SIREn supports efficient full text search with semi-structural queries and exhibits a concise index, constant time updates and inherits Information ...




Publication date: 2017